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Abstract: We perform a global fit of the most relevant neutrinoless double beta decay experiments 
within the standard model with massive Majorana neutrinos. Using Bayesian inference makes it 
possible to take into account the theoretical uncertainties on the nuclear matrix elements in a fully 
consistent way. First, we analyze the data used to claim the observation of neutrinoless double beta 
decay in ''^Ge, and find strong evidence (according to Jeffrey's scale) for a peak in the spectrum and 
moderate evidence for that the peak is actually close to the energy expected for the neutrinoless 
decay. We also find a significantly larger statistical error than the original analysis, which we include 
in the comparison with other data. Then, we statistically test the consistency between this claim 
with that of recent measurements using ^^^Xe. We find that the two data sets are about 40 to 80 
times more probable under the assumption that they are inconsistent, depending on the nuclear 
matrix element uncertainties and the prior on the smallest neutrino mass. Hence, there is moderate 
to strong evidence of incompatibility, and for equal prior probabilities the posterior probability of 
compatibility is between 1.3% and 2.5%. If one, despite such evidence for incompatibility, combines 
the two data sets, we find that the total evidence of neutrinoless double beta decay is negligible. If 
one ignores the claim, there is weak evidence against the existence of the decay. We also perform 
approximate frequentist tests of compatibility for fixed ratios of the nuclear matrix elements, as 
well as of the no signal hypothesis. Generalization to other sets of experiments as well as other 
mechanisms mediating the decay is possible. 
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1 Introduction 

The well-established phenomenon of neutrino oscillations requires that at least two of the three 
neutrinos of the standards model are massive, and massive neutrinos could cause observable effects 
in other experiments. Electron spectra from beta decaying nuclei could be affected, but searches for 
such modifications have so far come up negative [1, 2]. Furthermore, cosmological observations can 
also provide information, although the constraints depend on which cosmological model is assumed 
[3, 4]. 

Since the neutrinos do not have any charges under known unbroken gauge symmetries, it is 
possible that the neutrinos are Majorana particles, i.e., their own antiparticles. The most efficient 
way of determining this is to search for a certain kind of nuclear decay in which a nucleus undergoes 
a double beta decay without emitting any neutrinos, so-called neutrinoless double beta decay. There 
are numerous particles in renormalizable models beyond the standard model which could mediate 
the decay [5], and one can also perform studies using effective field theories [6, 7]. However, the 
most commonly studied field mediating the decay are the standard model neutrinos with Majorana 
masses. In this case, neutrinoless double beta decay is sensitive to the effective mass given by 



\{M,)ee\ = 



= \m^c\^c\^ m2S?2C?3e''" + mas^ge'*''!, (1.1) 



where M.y is the Majorana mass matrix of the neutrinos and {A4i,)ee is its ee'th element. The to^'s 
are the masses of the mass eigenstate neutrinos. Cy- and Sy are the cosines and sines of the mixing 
angles Oij, and a and /3 are the so-called Majorana phases. The inverse half- life of a given nucleus 
N due to neutrinoless double beta decay is given by 

= Gjv|A^w|Ve, (1-2) 

with Gn a known numerical phase space factor. AAn is the nuclear matrix element (NME), which 
encodes the nuclear processes at work. Its calculation is a very difficult problem, and requires 
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certain approximations and assumptions to be made. There are different models and approaches 
used to calculate the NMEs, which have often given quite different results, although much progress 
has been made in recent years. As a consequence, the required values of the NMEs have rather large 
"theoretical" uncertainties, which needs to be taken into account when analyzing data within models 
beyond the standard model predicting the decay. For further information, see, e.g., Refs. [5, 8, 9]. 

There has been no clear detection of the neutrinoless decay generally accepted by the commu- 
nity, although a small subset of the Heidelberg-Moscow experiment, using ^^Ge as the decaying 
nucleus, did make such a claim [10]. This analysis was subsequently updated in Refs. [11, 12] (see 
also Ref. [13]). For many years, there was no other experiment with enough sensitivity to test these 
claims. Recently, however, the first data from a new generation of experiments has been released 
[14, 15], which have sensitivities in the same region of rriee-values preferred by the ^®Ge claim, 
and which show no evidence of the decay. However, these experiments use ^'^^Xe as the decaying 
nucleus, which has its own NME, also with a large uncertainty. In order to combine and compare 
these results, one first needs to chose a specific underlying mechanism predicting the decay, and 
take into account the uncertainties on both the NMEs. 

The aim of this work is to perform such an analysis within the standard model with massive 
Majorana neutrinos, and we choose to concentrate on a Bayesian analysis since this, in addition to all 
its usual advantages (see, e.g., Ref. [16]), allows us to take into account the large NME uncertainties 
in a fully consistent and statistically coherent way. Often, one is interested in obtaining constraints 
using a combination of multiple sets of data. However, one should first be convinced that the 
different data sets are actually consistent within the model under study. Testing the consistency 
is extra relevant in the case of neutrinoless double beta decay since the recent data seems to be 
rather inconsistent with the claim of Ref. [11]. Our analysis will take into account the statistical 
uncertainties in all the data sets as well as the theoretical uncertainties of the NMEs, neither of 
which was fully considered in the comparisons of Refs. [14, 15]. 

This work is organized as follows. In Sec. 2, we review the principles of Bayesian inference, 
with focus on Bayesian model selection and the compatibility test we will employ. Sec. 3 is an 
analysis of the data presented in Ref. [11], which is necessary in order to make the comparison with 
the other data sets possible. Sec. 4 is a description of the other data and likelihoods used in the 
analysis, while Sec. 5 describes the underlying model, its parameters, the treatment of the NME 
uncertainties, and the priors used. In Sec. 6, the results of the analysis are presented, and the 
conclusions can be found in Sec. 7. 



2 Bayesian inference 

In the Bayesian interpretation, probability is associated with degree of belief. This is in contrast to 
the frequcntist interpretation, in which probability is defined as the limit of the relative frequency 
of an event in a large number of repeated trials. 

If one accepts the Bayesian interpretation, a very powerful arsenal of inference tools become 
available. In essence, Bayesian inference is a framework for updating prior belief or knowledge 
based on new information or data. Generally, the probability Pr(A)i?) represents the degree of 
belief regarding the truth of A, given B. The order of the conditioning can be reversed using 
Bayes' theorem, 

, , Pr(B\A)PT(A) , , 

Pr(ALB = — ^ ' ■ (2.1 

^ ' ' Pr(B) ^ ' 

Perhaps the central purpose of science is to infer which model or hypothesis best, and usually 
most economically, describes a certain set of collected data. If the collected data is denoted by 
D and the set of plausible hypotheses the most straightforward Bayesian solution is to 
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simply use Bayes' theorem to calculate the posterior probability of each of the hypotheses, ""^ 

_ Pr(D|g.)Pr(g.) _ Pr(D|g,)Pr(g,) 

Here, -Z^ = Pr(D|_H'i) is the probability of the data, assuming the model Hi to be true, and is often 
called the evidence of the model Hi. Equation 2.2 can then also be written as 

Pr(g.|D)= ^ ^ ^..P.(H.) - (2-3) 

If one is not comfortable with assigning absolute probabilities to the different hypothesis, one can 
instead consider only the posterior odds, which is the ratio of the posterior probabilities, 

Pr(g,|D) _ Pr(D|gO Pr(g,) _ Pt{H,) 
Pr{H,\B) Pr(D|i7,) Pr(i/,) Z,Pr(i7,)' 

which implies that the posterior odds equals the prior odds (often chosen as unity) times the Bayes 
factor, the ratio of the evidences. This method of comparing models is usually called model selection, 
although model comparison or model inference might be more accurate descriptions in the case that 
no single model is actually selected. 

If the model H is simple, i.e., has no free parameters, then the evidence is simply the probability 
(density) of the data D when H is assumed to be true. If instead the model contains free parameters 
0, straightforward application of the laws of probability implies that the evidence is given by 

Z = Pi{B\H) = J Pr(D,0|i?)d^0 = J Pr(D|0,iJ)Pr(0|iJ)d^0 

= J /:(0)7r(0)d^0. (2.5) 

Here, the likelihood function C{&) = Pr(D|0,_ff) is the probability (density) of the data D, as- 
suming parameter values and 7r(0) = Pr(0|i7) is the prior probability (density), which should 
reflects one's degree of belief of the parameters, given the model and the background information 
but not the data. 

One observes that the evidence is the average of the likelihood over the prior, and hence this 
method automatically implements a form of Occam 's razor, since in general a more predictive model 
will have a larger evidence than a less predictive one, unless the latter can fit the data substantially 
better. Bayes factors or posterior odds are usually interpreted using Jeffrey's scale in Tab. 1^, as 
used in, for example, Refs. [16-20]. Note that probability itself implies a somewhat unique and 
meaningful scale of the evidence, and that Tab. 1 simply gives rough descriptive statements of 
posterior odds and probabilities. 

Note that it is often the case that the evidence is quite dependent on the prior used [18, 21], 
although the Bayes factor will generally favour the correct model once "enough" data has been 
obtained. Furthermore, when comparing nested models, taking the Bayesian view also means that 
the significance, or the "number of ct's", of a result is in general not a good indicator of the 
importance or the evidence of a new effect, a result that is known as "Lindley's paradox". For 
hu'ther details, see, e.g.. Appendix A of Ref. [18]. 

Once one or a set of models with large posterior probabilities has been found, the complete 
inference of the parameters of those models are given by the posterior distribution through Bayes' 



^AU probabilities are also implicitly assumed to bo conditioned on all the relevant background information /, i.e.. 
Pr(X) is written instead of Pr(X|/). 

■^We denote the natural logarithm as "log" and the base-10 logarithm as "Ig". 
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log(odds) 


odds 


Pr(-ffilD) 


Interpretation 


< 1.0 


< 3 : 1 


^ 0.75 


Inconclusive 


1.0 


-3:1 


~ 0.75 


Weak evidence 


2.5 


~ 12 : 1 


~ 0.92 


Moderate evidence 


5.0 


~ 150 : 1 


~ 0.993 


Strong evidence 



Table 1. Jeffrey's scale often used for the interpretation of Bayes factors, odds, and model probabilities. 
The posterior model probabilities for the preferred model are calculated by assuming only two competing 
hypotheses and equal prior probabilities. 



P„e|D, H, = ^-'°l^;g|^fl"' = ^M. (2.6) 

Since the evidence docs not depend on the values of the parameters 0, it is usually ignored in 
parameter estimation problems and the parameter inference is obtained using the unnormalized 
posterior. However, note that the evaluation of the posterior distribution of the parameters is 
only meaningful if the model does not have a very small posterior probability, since otherwise the 
model as a whole is strongly disfavored. In practise, this means that one should first calculate the 
evidences and posterior odds and only then, for the models with not to small evidences, calculate 
the posterior distribution. 

In fact, if more than one model has a significant probability, it is better to consider the dis- 
tributions of parameters not assuming the model with maximum probability to be correct, but 
instead take into account the uncertainty regarding which model is the correct one, giving the 
model- averaged distribution [21] 

r 

Pr{&\X) = Y,P^{®\H^,X)Pl■{H,\X), (2.7) 

1=1 

which is the probability distribution given by the average of the individual distributions over the 
space of models, with weights equal to the model probabilities. 

The main result of Bayesian parameter inference is the posterior and its marginalized versions 
(usually in one or two dimensions). However, it is also common to give point estimates such as 
the posterior mean or median, as well as credible intervals (regions), which are defined as intervals 
(regions) containing a certain amount of posterior probability. Note that these regions are not 
unique without further restrictions, just as for classical confidence intervals, and that in general 
they do not contain all the information that the posterior contains. 

Although the reasoning and techniques used when performing model selection are often different 
than when estimating parameters, one can equally well consider model selection as a parameter 
inference problem with an additional discrete parameter denoting the model index. Hence, there 
is no real "fundamental" difference between model selection and parameter estimation. We use 
MultiNest [22, 23] for the evaluation of all evidences and posterior distributions in this work. 

2.1 A Bayesian consistency test of observables 

One can also use Bayesian model selection to test if a set of data is consistent or not within a given 
model H [19, 24, 25]. First, one partitions the data as D = (latest, £'bkg), where we want to test the 
internal consistency of Dtest = (i^i,^2, • ■ • ,Dk), k > 2, given -Dbkg, a- set of possible background 
data, assumed to be correct and internally consistent. Let 

C: The data -Dtcst considered are all consistent within H, given the background data -Dbkg- 
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C: -Dtest are inconsistent and hence lead to different regions of parameter space being preferred, 
i.e., {Di, D2, . . . , Dk) need different sets of parameters to describe the data. 

We want to calculate the ratio of posterior probabilities of C and C (with implicit conditioning on 
H), given by 

Pr(C|Acst,Jbkg) ^ Pr(Acst|iJbkg,C) Pr(C|A.kg) ^ Pr(Acst|gbkg, C) Pr(C) 
Pr(C|Ac.t,^bkg) Pr(Acst|£'bkg,C)Pr(C|Z?bkg) Pr(Aest|^bkg, C) Pr(C) ' 

where we have in the last step have used that Pr(C|£'bkg)/ Pi'(C'l-Dbkg) = Pi'(C')/ Pr(C'), since the 
probability that latest is consistent should not change without considering it. From this also follows 
that PT{Dbi^g\C) = Pr(i:)bkg|C) = Pr(£)bkg)- The calculable part of Eq. (2.8) is the Bayes factor 



^ ^ Pr(Acst|iJbkg,C) ^ Pr(AcstliJbkg) ,2 9) 

Pr(Acst|i?bkg,C) ntiPi-(A|i?bkg)' 



where the last step follows from the defining property of the hypotheses C and C: the data in Z^tost 
can be described by the same parameters (of H) under C, but need different sets under C. 

These are in principle "ordinary" evidence integrals, with the exception that the prior used in 
the integral is conditioned on Dbkg, i-e-, it can be considered the posterior of the data -Dbkg- Hence, 

Pr(Acstli^bkg) = y"Pr(Aest|0)Pr(e|Z?bkg)d^0, (2.10) 

and similarly for the other evidences. The conditioning on Dbkg can be dropped in the likelihood 
since the probability distribution of the data -Dtcst does not depend on Z^bkg if all the free parameters 
are fixed. If there is no background data, one of course simply uses the "original" priors in the 
evidence integrals. 

However, the integral can be difficult to perform in practise if Pr(0|Dbkg) is not simple. Then 
one can use that Eq. (2.9) can be written as 

TZ^ Pr(Acst,jJbkg|g) p,(j^ (2.11) 
ntiPr(A,i^bkg|H) 

These evidences are the evidences using the original priors, but now also including the background 
data in the likelihoods. 

Although the expression for the Bayes factor in Eqs. (2.9) and (2.11) (and from that the 
posterior probability of consistency) is derived from probability theory, it can still be good to 
test it on problems where it is obvious what the result "must" be. This has been done on both 
simple and more advanced toy problems in Refs. [19, 24, 25] for the special case k = 2. As an 
analytical example, let Di and D2 result in likelihoods £i(0) and £2(0), respectively. If one 
of the likelihoods (say £2) is constant, and hence give no information on the parameter values, 
one should obtain TZ = 1 for any background data and £1, since in this case there is only one 
actual measurement and hence one cannot say anything about the compatibility. This is indeed 
what Eq. (2.11) reduces to. As a second example, consider the case when the second data set 
determines the model parameters exactly, £2(0) = S{@ — &o). In this case the compatibility test 
should be equivalent to testing if the first data set favors the null hypothesis = 0o or the more 
complex one with prior 7r(0), or 7r(0|Z3bkg) if background data is included. Indeed, one finds that 
n = Pi{Di\&o)/Pr{Di\H, i5bkg) = £i(0o)/ / £i(0)^(0|L'bkg)d^0. 

3 The data, likelihood, and evidence of the claim using ^^Ge. 

In order to be able to combine and compare the claim of Ref. [11] with other measurements, we will 
in this section reanalyze the data presented there. There, the authors presented a large number of 
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spectra for different event selection criteria. The choice of which of these to use for further analysis 
is rather arbitrary; we will use the spectrum in Fig. 9 c) of Ref. [11], which is for the combination 
of the two event selection methods used, and with the hardest cut on the allowed distance from the 
edge of the detector. The data, collectively denoted by Dce, is plotted in Fig. 1. 

First, the practise of trying to determine if there is a peak in the spectrum without explicitly 
considering the null hypothesis of no signal, i.e., implicitly assuming that there is a peak, is not 
very good. Instead, what one needs to do is to statistically compare the model without a peak with 
a model which has a peak. Actually, it seems too restrictive to assume that any possible peak can 
only come from a neutrinoless double beta decay signal, or something which looks similar. Hence, 
in addition to the background only model, we consider two signal hypotheses. First, a hypothesis 
which states that there is a peak somewhere in the spectrum, which could for example be a peak from 
some other radioactive decay with an energy which happens to fall inside the spectrum. Second, a 
hypothesis stating that there is a signal with the form of that expected from neutrinoless double 
beta decay, i.e., close to the Q-value Q = 2039.00 ± 0.05 kcV [26] of the decay. We assume that any 
peak has a true width smaller than the energy resolution of the detector. Hence, due to the finite 
energy resolution of the detector, the expected spectrum in the detector has a peak with the total 
number of signal events s spread over a distribution (usually taken as a Gaussian) with some width 
a. A related analysis of simulated spectra was performed in Ref. [27]. 

We thus want to compare the following hypotheses: 

Hq: There is only a constant background rate in each bin, which is a priori unknown. 

Hi : There is a peak somewhere in the spectrum with unknown position, together with a constant 
background rate. 

H2'. There is a peak with the properties which is expected of a neutrinoless double beta decay line, 
and a constant background rate. This means that the peak is centered close to the Q-value 
of the nucleus. 

This set of hypotheses could be extended with hypotheses allowing more than one peak in the 
spectrum, as well as one with a broader peak, but it should be clear from the spectrum in Fig. 1 
that these would be disfavored. 

A Bayesian analysis of the spectrum in Fig. 1 is in principle straightforward, but is nonetheless 
an interesting exercise. There is, however, a small complication in the present case. Since the event 
selection of Ref. [11] results in each event being assigned a weight generally different from unity, the 
resulting spectrum contains non-integer number of "measured events" . The probability distribution 
in each bin, given an expected number of events, is impossible to estimate, and one notices that 
there is a much larger probability to have a number of events close to an integer than far away from 
one. 

For a Bayesian analysis, however, we do not really need the distribution of the data (given 
values of the parameters), but only the relative probability of the data actually observed as a 
function of the free parameters. We expect this function, the likelihood, to be still reasonably well 
approximated by'^ 

"bins 

£gc(0)oc n A.(e)^-e-^'(®), (3.1) 

4 = 1 

where and Ni is the predicted and observed number of events in bin i, respectively. Under the 
null hypothesis A; = 6, while under the alternatives 

Ai(0) = s/' -^e-^^^dE + b, (3.2) 

''Note that Poisson likelihoods for non-integer data arc sometimes used in high-energy physics [28]. 



- 6 - 



> 

CD 



^ 3 



I Data 
■Orginal fit 

Maximum likelihood 

Posterior median, s ~ uniform 
- Posterior median, s ~ logarithmic 




2000 2010 2020 2030 2040 2050 2060 

E„/keV 



Figure 1. The data of Fig. 9 c) of Ref. [11], together with their fit, our maximum hkehhood, and posterior 
median estimates for both the uniform and logarithmic priors on s. 



Prior: 


log(Zi/Zo) 


log(Z2/Zo) 


l0g(Z2/Zi) 


s - LOG(0.1,30) 
s - LOG(10-^30) 
s - U(0,30) 


3.37 ±0.07 
2.55 ±0.07 
4.32 ±0.07 


5.81 ±0.06 
4.92 ±0.06 
6.38 ±0.06 


2.44 ±0.09 
2.67 ±0.09 
2.06 ±0.09 



Table 2. Logarithms of Bayes factors for different priors on the signal strength. The priors on the nuisance 
parameters are given in Tab. 4. 



where bin i is between iJ""" and E™'"^^, = {s, Eo,(T,b), and the signal strength, position of the 
peak, peak width, and background rate are denoted by s,i?o,c, and &, respectively. 



Prior: X ^ 


Description 


U(a,6) 
L0G(a,6) 

N(m,<7) 


Uniform between a and b 
log(X)^U(log(a),log(6)) 
Normal with mean fi and standard deviation a 
log(X)^N(log(Ai),log((7)) 



Table 3. Notation for priors. The uniform prior is zero outside the interval [a, b]. For the log-normal prior, 
the median (but not the mode or mean) of X is /i, while roughly 68.3% of the prior probability is contained 
in the interval [h/ct, fia]. 



The most important parameter in both Hi and H2 is the signal strength s. One might think 
that a uniform prior on s is the most natural choice. However, one might also be uncertain about 
the scale of the signal, in which case a prior uniform in log s would be more appropriate. In any 
case, one needs to specify an upper limit. Neutrinoless double beta decay has been searched for 



-7- 




20 2035 2040 0.5 1 0.4 0.8 
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Figure 2. Posterior distributions of the parameters. All the parameters plotted were assigned uniform 
priors. 



Model 






6- 


Ho 






U(0,.) 


Hi 




L0GN(1.5kcV, 1.3) 


U(0,.) 


H2 


N(2039 keV, 2 keV) 


L0GN(1.5keV, 1.3) 


U(0,.) 



Table 4. Priors on nuisance parameters. The notation using a "•" as in U(0, •) means that as long as that 
limit is chosen large (or small) enough so as to not cut off the posterior, its precise value does not matter. 



in germanium before [29] , the result of which one can include in the background information and 
implemented as a (quite conservative) upper limit of 30 events. The lower limit for the uniform prior 
is naturally taken to be zero, while for the log priors it should be smaller than about 1 event, but 
large enough to put a non- negligible amount of prior probability in the region s > 1. Alternatively, 
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one could assume that the signal would be the result of Majorana neutrino exchange, and use the 
external constraints from neutrino oscillation and tritium beta decay experiments that exist on 
niee- Since in this case Eq. (1.2) gives log s = C + 2 log + 2 logmee, log s would, apart from some 
smearing due to the uncertainty in the NME, have the same shape of its prior as mee. Reasonable 
priors for s would then be a logarithmically uniform distribution between roughly 10~* eV and 
30 eV for the normal mass ordering and 0.03 eV and 30 eV for the inverted (see Fig. 3). Hence, we 
use the three different priors of Tab. 2," while the meaning of all the priors used in this work can 
be found in Tab. 3. The data will also be analyzed within the full model of the standard model 
with massive neutrinos in Sec. 6. 

The main difference between hypotheses Hi and H2 is the prior knowledge put on the position 
of the peak, Eq. Under Hi, the most natural choice is to have a uniform prior between the endpoints 
of the spectrum, = 2000 keV and E"'^'' = 2060 keV. Under H2 one could simply fix Eq to the 

Q-value. However, since we by Eq mean the position of the peak in the observed spectrum rather 
then the true one, one should consider the possibility of a systematic shift of the position of the 
peak. Ref. [11] estimates the systematic uncertainty to about 1.2 keV, and in order to allow for this 
being an underestimation, we take a Gaussian prior of width 2keV, i.e., Eq ^ N(2039keV, 2keV). 
Using the information in Refs. [10, 11, 30, 31] we estimate the energy resolution to be roughly 
1.5 eV, but due to lack of detailed information of the experiment, we take a rather conservative 
uncertainty and a ^ L0GN(1.5keV, 1.3). The priors on the nuisance parameters are summarized 
in Tab. 4. Under all hypotheses, we take a simple uniform prior on the background rate b. The 
resulting posteriors and Bayes factors are insensitive to its upper limit as long as it is large enough, 
which is denoted by 6 ~ U(0, •). 

It is always wise to evaluate if the final inference is sensitive to the priors used. The sensitivity 
to the prior on the signal strength is shown in Tab. 2. For log priors with smaller lower limits, the 
signal hypotheses become more and more similar to Hq, and hence their evidences slowly approach 
those of Hq, but as we can see, this happens very slowly and for reasonable lower limits, this effect 
is very small and does not change any conclusions. For the nuisance parameters, we have the 
following. Fixing cr at 1.5 keV leaves the log evidence unchanged within the numerical errors of 
0.1, while doubling the uncertainty only lowers the log evidence by 0.3 ±0.1. In a similar way, one 
finds that lowering the prior uncertainty of Eq to 1.2 keV (the estimated systematic uncertainty) 
leaves the evidences invariant within the errors, while increasing it to, say, 3 keV, only decreases it 
by about 0.25 ±0.1. 

If one assigns equal prior probabilities to the three hypotheses, Eq. (2.3) together with the 
Bayes factors of Tab. 2 results in the posterior model probabilities 



The small variations of the posterior probabilities of Hi and H2 is because \og{Z2/Zi) is essentially 
determined by the difference in the prior on Eq and is hence rather independent of the prior on s. 

The marginalized posteriors of the parameters are shown in Fig. 3. They are obtained using 
the priors of Hi , but with the prior on a replaced with a log prior with a lower limit of 1 keV, 
allowing for wider peaks in the spectrum. The median and 68% central credible intervals for 
the signal strength are given by s = 12.2^3 g and s ~ 10.6^3 5, for the model H2 with uniform 
and logarithmic priors on s, respectively. However, both quite small and large values of s are 
allowed, with central 99% credible intervals given by [3.7,25.0] and [1.8,23.0] for the same two 

*The hard lower bound can be avoided by using a prior which decays smoothly, e.g., by taking a uniform distri- 
bution below the presently used lower limit, but this will not noticeably change any results. 



Pr(i7o|i5Gc) 
Pr(i/i|i?Gc) 
FriH2\DGc) 



(1.5-6.6) • IQ-^ 



(3.3) 
(3.4) 
(3.5) 



0.08-0.11 



0.89 - 0.92. 
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prior choices. The maximum hkehhood point and an approximate 68% confidence interval using 
the profile likelihood is s = 10.5^3 5. These signal rates and errors should be compared with the 
reported s ~ 10.75 ± 1.58 of Ref. [11]. In Fig. 1 arc also plotted the expected distribution of events 
for different point estimates of the parameters. Our analysis indicate that signal rates a factor of 
2 smaller than the best fit, and hence half-lives a factor of 2 larger, are allowed at a reasonable 
level. We believe that this larger error should be taken into account when comparing with other 
experiments, which has not been done in previous analyses such as Refs. [14, 15]. 

Finally, we comment on the significance of a signal using frequentist hypothesis test. The stan- 
dard hypothesis tests based on the profile likelihood ratio are not applicable because the nuisance 
parameters are not defined under the null hypothesis (see, e.g., [32, 33]). In principle one could 
neglect the uncertainties in a and Eq and fix them, but since the number of events are so small, 
the expected distribution would most likely be far from the asymptotic one anyway. In any case, 
we find 

o, X /sup„^ £(s = 0,PGe)\ 

^ - ^ ( Xl^oi ) ^ ''-'^ ^'-'^ 

with pGe = (Eq, (t, b) the nuisance parameters. Since the distribution of is not known, this 
number does not say very much, but for half a ^^-distribution with 3 and 1 degrees of freedom, the 
above value of would correspond to a significance of 3.7 and 4.4 standard deviations, respectively. 
Alternatively, since one knows the expected position of the peak one can also simply merge all bins 
in in the region 2039 ± 3 kcV (taking into account the energy resolution as well as the systematic 
uncertainty on Eq) into a single counting experiment. One can also imagine that one first fits the 
background and then fix it to b ~ 0.65. This gives a total of about Uq'^ = 15 observed events with 
an expected background of 4.6 events in the signal region. Hence, the p- value Pr(nGG > "-ccf l^o) — 
9 • 10~^, corresponding to a 3.7cr significance.^ In addition, the errors on the signal strength is 



compatible with the naive error estimate of Y??Gcf — ^^'^ ^^"^^ thai our Baycsian estimates 
agrees both with that based on the likelihood and that of naive counting of events, both regarding 
the point estimates and the errors, strengthens our belief that our analysis is robust. 

When the data is analyzed within the standard model with massive Majorana neutrinos in 
Sec. 6, the relation between the inverse half-life of Eq. (1.2) and the signal strength is obtained 
using the information of Ref. [11], which implies that s = e ■ 2.5 • (Tqc/IO^^ yr)~^, with e = 1. 
According to Ref. [34]^, this corresponds to assuming a signal detection efficiency of 100%, which 
is most likely not realistic. Note that a smaller value of epsilon would mean that, in order to 
reproduce the same signal rate, a larger decay rate, and hence a larger rriee (for fixed NME), would 
be required. We will, however, consider the choice e = 1 as part of the considered claim, and only 
briefly comment on what choosing a smaller e would imply for the results in Sec. 6. 



4 Other data and likelihoods 

The most relevant other data on neutrinoless double beta decay are the recent measurements using 
^^^Xe as the decaying nucleus [14, 15]. The EXO collaboration reported one observed event with 
4.1 expected background events within the ±lcr signal region [15]. Since the background should be 
well determined from data outside of this region, we simply take a Poisson likelihood 

Cexo « (sExo + 6EXo)'^""°e-(-^--°+''-'^°) (4.1) 

with TVexo = 1 and 6exo = 4.1. We neglect the uncertainty in 5exo (reported as 0.3), although it 
could easily be incorporated with negligible impact on the end results. The number of signal events 

^Ref. [34] calculated a 5cr significance using a similar method. However, there the observed data was used to select 
which bins to merge, leading to an overestimated significance. 

^There is was also pointed out that the error on the signal rate should be of the size we have found. 
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is given by 

Assuming that the decay is mediated by massive Majorana neutrinos, the half-hfe is again given by 
Eq. (1.2). 

In searches for rare processes in particle physics it is standard practise to not only report 
frequcntist upper limits at some fixed confidence level, but to also report information useful to 
the rest of the scientific community, making it possible to combine with other experiments and 
to analyze alternative models. For approximately Gaussian measurements such could be the the 
maximum likelihood estimate and its error, even if the estimate is for an unphysical value of the 
parameter. Unfortunately, the KamLAND-Zen Collaboration reports very little information in 
Ref. [14] in addition to the observed upper limit. However, with some additional assumptions, 
one can still obtain an approximate likelihood. Since the expected signal sits on top of a rather 
large background, the maximum likelihood estimate of the inverse half-life, call it /i, should be 
approximately Gaussian distributed around the true inverse half-life /x. Then, assuming that the 
collaboration calculated the upper limit using the profile likelihood with the best-fit constrained 
to be smaller than the tested value (so that only upper limits result, see Ref. [28]), one can use 
their stated 90% sensitivity (median upper limit under the background hypothesis) to estimate the 
standard deviation of the maximum likelihood estimate as a/^ ~ 7.8-10~^^ yr~^ ■ In order to estimate 
the observed maximum likelihood decay rate, we note that Ref. [14] states that 12% of hypothetical 
measurements are expected to yield smaller upper limits under the background hypothesis. Since 
the upper limits in this case are monotonic functions of the maximum likelihood estimates, one 
should have ~ — l.lTcr^. If one summarizes the data with the maximum likelihood estimate, 
one obtains a Gaussian likelihood with center p,°^^ and width afi as above. In any case, the final 
results should not depend significantly on reasonable variations of these numbers. Finally, in order 
to predict the decay rate in each nucleus in Eq. (1.2), one needs the phase space factors, which we 
take from Tab. 1 of Ref. [35], and the nuclear matrix elements, to be discussed in Sec. 5.1. The 
EXO and KamLAND-Zen data are collectively denoted by Dxc- 

Furthermore, we need the likelihoods of the additional constraints on the model resulting from 
other types of experiments, i.e., the background data Dbkg- These are taken as all neutrino oscil- 
lation data analyzed in Ref. [36] and tritium beta decay data [1, 2]. We do not use cosmological 
observations nor other double beta decay experiments because of the associated theoretical and 
model uncertainties. 

The neutrino oscillation likelihood is a function of the six oscillation parameters Rose = 
^23' '^13' Since the oscillation parameters (expect 6) are rather well constrained 
and the correlations between the oscillation in the standard parameterization are rather small, we 
use an approximation of the likelihood as 

6 

/:osc(e)^n^-c(Rosc), (4.3) 

i=l 

where 

Csc(RLc) = exp (^^9!&A^ . (4.4) 

We do not assume Gaussianity of the individual likelihoods, but instead use the functions (R^s^) 
as plotted in Fig. 2 of Ref. [36]. Inclusion of the likelihood constraining S has no effect on the 
results, since it is only constrained by the background data. Note that a study applying model 
selection to neutrino oscillation data has been performed in Ref. [37]. 
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Electron spectra from beta decays of certain isotopes are sensitive to the kinematic effective 
mass-square 

2 V^lrr |2 2 222|222|22 r A t:\ 

rrifi^ 2^ \Ue^\ mi = m^^c^^Cis + m^s-^^Ciz + m^^iz- (4-5) 

i 

The results most sensitive to are those of Mainz [1] and Troitsk [2], yielding approximately 
Gaussian likelihoods 

m\ = -1.2 ±3.04 eV^ (4.6) 
m\ ^ -0.67 ±2.53 eV^ (4.7) 

respectively. 

5 Parameter space and priors 

In this section we describe the space of parameters used and the priors imposed on them. Since 
the likelihood of the tested data only depends on the particle physics parameters through niee , one 
could in principle perform the analysis using only that parameter, together with the NNEs and the 
nuisance parameters of the likelihood of Sec. 3. However, since we want to take into account the 
non-trivial constraints from neutrino oscillation and tritium beta decay on rriee, we instead choose 
to work with the full set of parameters of the Majorana mass matrix in terms of the masses and 
mixing parameters 

0PF = (too, Ato2i, Am3i,6'i2,6'23,6'i3,(5, a, /3), (5.1) 

where toq is the smallest neutrino mass, Am^i — m\ — m\, and Am|]^ — to| — to^. Hence, the full 
set of 14 parameters becomes 

(0pf,A1go,A1xo,Pgo). (5.2) 

This approach implies that the results in principle depend on the assumed mass ordering of the 
neutrinos, which can be either normal (Ato|]^ > 0) or inverted (Ato^j^ < 0), but we will show that 
in practise the evidence of incompatibility between the data sets as well as the evidence for the 
existence of neutrinoless double beta decay does not, as long as the same priors are used for both 
mass orderings. 

The purpose and result of including the background data is essentially to restrict the distribution 
of TOee which is then used to analyze the neutrinoless double beta decay data. Since the parameters 
Am22, Ato§]^, 012, 023, and ^13 are very well-determined by oscillation data, their priors are rather 
irrelevant, and so we simply take uniform priors. Usually, one should not use the data to select 
priors for the parameters. In this case, however, one can get away with it as long as the posterior 
using Dbkg is the way we expect. The priors on the phases 5, a, (3 are taken uniform, since this is 
the only choice consistent with the symmetries of the mass matrix [38, 39]. 

The remaining particle physics parameter is toq, which together with the two NMEs are those 
of interest. The priors on these parameters are also those for which the final inference may depend 
significantly. We use two different priors on the lightest neutrino mass TOq. First, toq > can 
be thought of as parameterizing the scale of neutrino masses (at least down to toq 10~^ eV for 
the measured mass squared differences), and one could argue that it is most "natural" for all the 
neutrino masses to be roughly of the same order of magnitude, or at least not differ by more than 
one or two orders. In this case, TOq should not differ by many orders of magnitude from the mass 
squared differences, say toq ~ 10""^ — leV. Hence, we take a log prior toq ~ LOG(10^'^ eV, •),^ 
which we call prior A. Second, in order to consider the possibility of having more prior probability 

'^The hard lower bound can again be avoided by using a prior which decays smoothly below 10"^ without altering 
the results. 
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put on larger masses, we take a prior proportional to Xj ^ttlq with lower limit 0, which is denoted by 
B. We do not use a uniform prior on jtiq, simply because we believe it to not reflect the view of the 
community.^ Note that increasing the prior upper limit on mg does not change any results, since 
this region is anyway excluded by the background data. Equivalently, the corresponding reduction 
of the evidences will cancel in the Bayes factors. 

5.1 The nuclear matrix element uncertainties 

If the nuclear matrix elements needed to predict the expected rate of neutrinoless double beta 
decay in Eq. (1.2) were known accurately, combining and comparing the different data sets would 
definitely be easier. Of course, one could always perform such an analysis for fixed ratios of the 
NMEs. However, then it is difficult to specify how much those would be allowed to vary. Instead, it 
is better if one can incorporate the NME uncertainty into the analysis from the very beginning. But 
since the NMEs are in no way measured, one cannot include a factor in the likelihood constraining 
them, and, in general, an analysis of experiments using J different nuclei have J+1 free parameters. 

Incorporating the NME uncertainties in a Bayesian analysis is in principle straightforward, since 
one can simply use the NME calculations (as well as the properties of the different calculational 
methods) as prior information. This should yield at least a somewhat well-specified prior probability 
distribution of the NMEs. A unimodal prior with a width (which can be varied) parameterizing 
the uncertainty seems a natural choice. The NMEs are then included as free parameters and 
marginalized over in the end. 

How does one then quantify the uncertainty in the NMEs? Models used to calculate the 
NMEs have constantly been improving leading to more accurate results, and compilations of recent 
representative calculations using different methods have been given, for example, in Refs. [5, 35]. 
In addition, there might be reasons, based on the properties of the methods and the assumptions 
used, to believe that some methods are likely to underestimate the NMEs, while some methods 
are probably overestimating them. This motivated the authors of Ref. [35] do define "physics- 
motivated" ranges of the NMEs for different nuclei. 

Furthermore, if many methods tend to under- or overestimate the NMEs in one nucleus, it is 
likely that they also do so in other. In other words, the NMEs of different nuclei should be positively 
correlated a priori. Most important for the comparison of experiments in two different nuclei is 
the ratio of NMEs, since a rescaling of both NMEs could be compensated by a change in in 
Eq. (1.2). Since any biases in the calculations of the NMEs in different nuclei is expected to cancel 
in their ratio to some degree, it seems most straightforward to use the uncertainty in the ratio to 
specify the priors. 

From Refs. [5, 35] we conclude that it is reasonable to assign A^gc and A^xo the same rel- 
ative uncertainties, and we take the marginalized priors of the NMEs as log-normal, A^xo ~ 
LOGN(toxc, o"^) and A^gc ^ LOGN(mGc, <^m)-, with mxc ~ 2.8 and togc = 4.1 as best estimates. 
If logAlxe and log AlGe then are jointly normally distributed with correlation coefficient p, then 
the ratio r = AIxc/AIgo ^ LOGN(toxo/"1go, f^) also has a log-normal prior with Or = a^^^'''. 
Hence, given as input the two uncertainties ctm and (7^, one can calculate p and the two-dimensional 
prior of the NMEs is defined. The correlations between the different NMEs within one method of 
NME calculations have been studied in Ref. [40], with which our correlations will roughly agree. 

Following the discussion in Ref. [35], one could be optimistic regarding our current knowledge 
and take ~ 1-15. The 95% central credible intervals for the NMEs are then [3.1, 5.4] for Mgc 
{A = 76) and [2.1,3.7] for Mxc {A = 136), which can be compared with Fig. 1 of Ref. [35]. 
However, considering the possibility that such a small error is too optimistic, one should also take a 

*One would have to believe that it is a priori equally probable for mo to be, say, in the interval [0, 0.01] eV as in 
the interval [1, 1.01] eV. 
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more conservative value of the uncertainty such as aM = 1-3, giving 95% central credible intervals 
as [2.5,6.9] and [1.7,4.7], respectively. The case of no NME uncertainty, aj^ = 1, is obviously 
unrealistic, but is included for comparison. 

The ratio of the five estimates complied in Ref. [35] lie in the range r = 0.54 — 0.90, while many 
of the ratios of Ref. [41] are smaller, around 0.45. In principle, assuming that the calculations of log r 
are independent of each other when conditioned on r^, have the same errors, and are not biased in 
any direction, one could use Bayes theorem to obtain the posterior of r (posterior to the calculations, 
but prior to the data). With a log- normal density (uncertainty CTcomp) of the calculations and with 
and a log prior on r, one obtains a log-normal posterior with median equal to the mean of the five 
calculations and uncertainty parameter cr^ = ctco^- For (Tcomp — 1-25 (consistent with the values 
of Ref. [35]), is only about 1.1. However, since we realize that the assumptions going into this 
reasoning might not be completely valid, we also take a more conservative value = 1.25. The 
prior 95% central credible interval for r is then [0.44, 1.06], we we think is wide enough. However, 
for comparison, we also leave room for even more conservatism, and sometimes also consider the 
uncertainties {(JMi'^r) = (1.5,1.35), for which the 95% central credible intervals for A^go, -Mxc, 
and r are [1.8,9.1], [1.3,6.2], and [0.38,1.23], respectively. 



6 Results 

In this section we perform the combined analysis of the relevant data Dqc, Dxo, and i'bkg within 
the standard model with massive Majorana neutrinos. We especially emphasize the constraints and 
evidence from the neutrinoless double beta decay data Dq^ and Dxc- 

However, before simply combining all the data to yield the final model and parameter inference, 
one should be convinced that the data are actually mutually consistent. We thus first want to test if 
the two sets of data in Dtcst = {Dqc, Dxc) are consistent, when i3bkg is included as prior constraints. 
As discussed in Sec. 2.1. the Baycs factor of compatibility vs. incompatibility is 

^ ^ PiiDGc,Dxc\Dbks,H) ^ Pr(i^Ge, Dxc, ^bkgjg) Pr(ZJbkg|g) , 

Pr(7^Gc|i?bkg,ii')Pr(i?Xc|i?bkg,ii') Pr(i^Gc,i?bkg|i?)Pr(Z?xc,i?bkg|i?)' ^ ' ' 

Note that any experiment yielding no evidence of neutrinoless double beta decay can never be "more 
inconsistent" with Dgo than the degree of which I?Ge is incompatible with the null hypothesis of 
no signal, regardless of which measure is used for the inconsistency. For the test of Eq. (6.1) one 
would have 

Pr(L>Gc|i?o) 
^- Pr(i^Gc|i?bkg,i?)- 

When the upper limit on mee from the data used to test -Dgg decreases, this inequality approaches 
an equality, as was discussed in Sec. 2.1. As will be discussed in Sec. 6.1, this lower limit is about 
—5 and —6 for \ogTZ for priors A and B, respectively. 

In principle, one could also evaluate a Bayesian version of a p-value, encoding how "extreme" 
or "surprising" the observed data Dqc is in the light of Dxc (or the other way around). This 
can be done including the uncertainties of the model parameters (including parameters with only 
theoretical uncertainties). For example, the probability that one would observe an equal or larger 
number of signal events tt-go than was actually observed, given Dxc and -Dbkg, is 

Pr(nGc > n°^^^\Dxc, Dy,i,g, H) = ^ Pr(nGc > ng,^]©, i/) Pr(©ll?xc, i^bkg, i^)d^0. (6.3) 

Since the signal rate for a non-zero rriee is always larger than zero, it holds that Pr(nGc > 
rioJf ]Dxg, ^bkg7 ^) > Pr(7T.Go > ^Gcfl-^o) — 10""', as was estimated in Sec. 3. Hence, Dgc can 



^This is probably not true since some of the methods share common features. 
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r = 


0.4 


0.5 


0.6 


0.7 


0.8 


0.9 


1.0 


Against compatibility (cr's) 
Against niee — (cr's) 


2.2 
3.1/3.9 


2.6 
2.8/3.6 


3.6 
2.4/3.2 


3.4 
2.0/2.9 


3.7 
1.6/2.5 


3.9 
1.2/2.1 


4.1 
0.8/1.7 



Table 5. Rough estimates of the significance when testing the compatibihty of the two neutrinoless double 
beta decay data sets for different fixed values of the NME ratio r, and the significance when testing mee = 
against rriee > for the combination of data. 

never be more surprising under H than under iJo, no matter how small the upper limit on is. 
As expected, we find larger p- values for the prior B on mo and for larger NME uncertainties. For all 
priors on mo and the NMEs (except the most conservative with (ctx, Gr) = (1.5, 1.35)), we obtain 
(for fixed background rate h) Pr(nGc > n°^^\D^c, I^bkg, H) between 2.8 • 10"^ (3.4cr) and 1.8 • 10"^ 
(2.9(t). However, p- values are not directly related to the actual probability of the hypothesis being 
tested, and are not even the probability of the observed data, but a probability of data which has 
never been observed. Hence, rather than going to much into detail about these, we will concentrate 
on the consistency of the parameter constraints using the test based on model selection of Sec. 2.1. 

If one fixes the ratio r of the NMEs, both Dqc and Dxo essentially constrain only mee. In 
this case, one could perform rough frequentist hypothesis tests using the likelihood ratio. To 
test the compatibility, one can calculate the ratio of the full likelihood when maximized over mee 
(and pgc) to that obtained when maximizing the individual likelihoods separately. ^'^ In fact, the 
ratio bears some resemblance to Eq. (6.1), but with the individual likelihoods maximized, rather 
than integrated, over the parameters (and the background data ignored). Although in this case we 
cannot know the precise distribution of the likelihood ratio, minus two times of its logarithm should 
asymptotically have a x^-distribution with one degree of freedom. Converting the observed value 
of the likelihood ratio into a rough estimate of the significance yields the results of the first row of 
Tab. 5 for various values of r. As expected, the significance increases significantly when r increases. 
Testing the hypothesis mee by calculating the likelihood ratio similar to Eq. (3.6), but now also 
including Dxo, yields the results in the second row of Tab. 5, where the two values correspond to 
an assumed x^-distribution with one and three degrees of freedom, respectively.^"'^ As one observes, 
the results are very dependent on the chosen value of r. Hence, we prefer to concentrate on the 
Bayesian analysis, where the NME uncertainties are instead integrated over. 

The different evidences in Eq. (6.1) can be associated with different posterior distributions: 
the posteriors using i) only background data, ii) background together with one set of tested data, 
iii) background together with the other set of tested data, and iv) background together with both 
sets of tested data. The posteriors using only the background data of the logarithm of ra^e for 
both priors on mo and both mass orderings are displayed in Fig. 3. All the posteriors are correctly 
normalized with respect to each other. Note that the posterior of rUee is independent of the mass 
ordering for mee ^ 0.1 eV. The peaks appearing for small mee arc due to the "band structure" of 
the constraints in the mo - mee plane (see, e.i?.. Fig. 2 of Ref. [8]). Putting more prior probability 
on small values of mo will result in these peaks being more pronounced, but that mee is always 
bounded from below (in a probabilistic sense) for both mass orderings. 

In the same manner, the posteriors for normal mass ordcring^^ when also using Dqc, Dxc, and 
all data, are shown in Fig. 4, with too ^ A in the left panel and niQ ^ B in the right, and for the 

^"This is essentially the method of Ref. [42] extended to general likelihoods. The maximum of the Xe likelihood is 
taken at rriee = and not for a negative value. 
^^Of eourse, the same caveats as in Sec. 3 applies. 

^^For the inverted mass ordering, the posteriors differ in the region of small rriee, but are essentially identical in 
the interesting region of large rriee- 
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Figure 3. Posteriors using only background data of the logarithms of mee for different priors and assumed 
mass orderings. 



case of fixed NMEs and the prior with {aMjO'r) = (1.3,1.25). 95% central credible intervals for 
ruee using is [0.22 cV, 0.42 cV] for fixed NMEs and [0.18 eV, 0.51 eV] for cr^ = 1.3, which on 
a logarithmic scale is about a factor of 1.6 larger. Hence, the statistical and NME uncertainties 
are roughly of the same size. One notes that there is some overlap between the posteriors using 
Dxc and Dq^. However, it is only the marginalization down to one dimension which is visible, and 
this does not tell you how much the posteriors in the full parameter space are overlapping. There 
are posterior correlations between all the three parameters (or mo), A^gc and A^go, since 
there is an a posteriori correlation between niee and the NME corresponding to the data used, and 
since that NME is also a priori correlated with the other NME. The test using Eq. (6.1) takes into 
account all the constraints in the full parameter space. 

To show the constraints in two dimensions, we draw 20000 points from the full posterior and 
show their distribution in the {mee, ■Mqc) plane in Fig. 5, using mp ~ A (top) and toq ^ (bottom) 
and {cTM^'^r) = (1.3,1.25). In order to increase readability, points below mee — 0.03eV are not 
shown. One can see that there is a small amount of posterior using Dxc (green pluses) that is 
overlapping with the region preferred by Dce (black boxes) around mee — 0.3 eV, as well as some 
posterior using Dqc which is found in the region mee ^ 0.1 eV. The posteriors in the (rUee, -Mxe) 
plane are quite similar. There is a slightly larger overlap in the high-mge region for mo ~ B, while 
there is a smaller overlap in the region mee ^ 0.1 eV. These effects will party cancel each other, 
making the compatibility test less dependent on the prior on mo. 

Our results for the logarithm of the Bayes factor TZ are summarized in Tab. 6, which all have 
statistical errors of 0.07 at one standard deviation. In general, the obtained values only depends 
weakly on the choice of priors within the set we consider, and is between moderate and strong 
against compatibility, according to Jeffrey's scale. As expected, the evidence is independent of 
the absolute NME uncertainties, but does depend on the uncertainty of the ratios of the NMEs. 
Furthermore, using prior B yields smaller evidence against compatibility, but only slightly so, and 
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Figure 4. Posteriors for the logarithm of rriee for normal mass ordering for different data sets and for both 
fixed NMEs and conservative priors with [oMyO'r) = (1-3, 1.25). The prior on mo is A (left) and B (right) 
and -Dbkg is always used. 



(ctm, (Tr) = 


(1,1) 


(1.15,1.1) 


(1.3,1.1) 


(1.3,1.25) 


(1.5,1.35) 


mo ^ A 
mo ~ B 


-4.42 
-4.42 


-4.31 
-4.35 


-4.31 
-4.28 


-4.06 
-3.68 


-3.66 
-3.28 



Table 6. Calculated values of the logarithms of the Bayes factor TZ for different priors on mo and the 
NMEs. All statistical errors on the numerical estimates are 0.07. 

only for the larger NME ratio uncertainty. Fixing the NMEs is clearly to restrictive, but essentially 
equivalent results are obtained for Ur = 1.1. In addition, the evidence against compatibility does 
not decrease too much when using the very large NME uncertainties {(JM^'^r) = (1.5,1.35). In 
total, considering the most realistic cases by excluding both the results for fixed NMEs and for the 
largest uncertainties, we obtain 

log 7^ e [-4.35,-3.68], 7^"l e [40, 80]. (6.4) 

Hence, one can say that the data Dqc and Z?xe are about 40 to 80 times more probable under the 
hypothesis that they are incompatible, and hence need different sets of parameters to describe the 
data. If one assigns equal prior probabilities to the two hypotheses, one obtains 

Pr(C|I?Gc,i?Xc,A.kg) ^ 1.3% -2.5%. (6.5) 

Lowering the limit on mo for the log prior A will decrease the evidence against compatibility, but 
for reasonable lower limits (say, above 10~^), this effects is very small. In addition, one can consider 
putting some small but finite prior probability at mo = 0, but since this will also only change the 
results marginally we do not consider this possibility further. 

In the end of Sec. 3, it was noted that the efficiency e might be smaller than one, but that 
we have simply included e = 1 as part of the claim of Ref. [11]. A smaller e would require larger 
niee to fit the data, which would essentially mean that the black curves and points of Figs. 4 and 5 
would be shifted lg(e)/2 log-units to the right, and decreasing e can thus only increase the degree 
of incompatibility with Dxc, independent of how that incompatibility is evaluated. As a result, the 
evidence of the existence of decay for the full set of data is expected to decrease. 
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Figure 5. Equally weighted samples from the posterior using Dqc (green pluses) and Dxc (black squares) 
for the prior A (top) and B (bottom) on mo and {(JM,o'r) = (1-3, 1.25). In total 20000 points were drawn, 
but some points are outside the plot ranges. 



We have tested the compatibility of the different sets of neutrinoless double beta decay exper- 
iments with Dbkg used as prior constraints on the model parameters. In principle, one could also 
try to perform the same test while ignoring the background data. In this case, however, the prior 
on niee would be very difficult to specify, and this would make the results of the compatibility tests 
very prior dependent. However, we think that the relevant question is instead what we have inves- 
tigated in this work, i.e., the consistency of the different neutrinoless double beta decay data sets. 
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within the standard model with massive Majorana neutrinos together with all its already existing 
constraints. 

Finally, we note that we have checked that we obtain reasonable results if various inputs to 
our analysis are changed. For example, removing the constraints of I?xe by fixing A^xo to a very 
small value, we find that the evidence against compatibility disappears (TZ ~ 1), and if we remove 
the signal from Fig. 1, we obtain a value of TZ just above unity. If we instead inject a signal in 
Dxc consistent with niee — 0.3 eV and A^xc — 2.8 (with similar width as the real likelihood), we 
do obtain weak to moderate evidence in favor of compatibility. 

6.1 Evidence of the decay from different data sets 

One can also evaluate the evidence of the decay from Dgc as in Sec. 3 but within the full model with 
massive neutrinos, i.e., with the signal strength being derived from the particle physics parameters 
and A^Gc- Again, using I?bkg to constrain the model parameters, one obtains the Bayes factor 

^ Pt{Dg,\H, ZJbkg) ^ PrpGc,^bkg|g) , . 

Pr(Z?Gc|i?o,i?bkg) Pr(Z?Gc|i?o)Pr(^bkg|i?)' ^ ' ' 

since the conditioning on Dbkg is irrelevant under Hq. We find log^Go — 5.2 (6.0) for the prior A 
(B).^^ Once again, the inclusion of the background data eliminates the dependence on the upper 
limit on the prior on niQ. These are very similar to the values found in Tab. 2 when the signal 
strength s was the free parameter. It is, as expected, independent of the NME uncertainties and 
the assumed mass ordering. The latter is because, although the ranges of the priors on niee are 
different, the priors in the region mee > 0.1 eV are the same (see Fig. 4) for both mass orderings. 

Although there is substantial evidence against the data sets being compatible, one can disre- 
gard this fact and still calculate the total evidence of neutrinoless double beta decay. When all 
experiments are considered, we obtain the Bayes factor 

^ PTjDxcDGclH, i^bkg) ^ Pr(Jxc,iJGc,^bkg|g) , . 

Pr(i^xc,i^Gc|i?o,i^bkg) Pr(Z?xc,i^Ge|i?o)Pr(i^bkg|i?)' ^ ' ' 

For all combinations of priors on toq and the NMEs (excluding the most conservative NME priors), 
we find logStot = 0.5 — 0.8, which is no evidence or only very weak. The exception is for mo ~ B 
and (crjVi,CTr) = (1-3, 1-25), for which logStot — 1-33. The reason for this is that the larger NME 
errors allows I?gc and Dxc to be better fitted simultaneously while the prior on toq puts more 
prior probability in the region preferred by the combination of data. Our rough estimates of the 
frequentist significances for fixed NME ratios was discussed earlier and are summarized in Tab. 5. 

If one believes that the evidence against compatibility between the data sets is best treated by 
simply ignoring the claim of Ref. [11], one obtains the evidence for neutrinoless double beta decay 
as 

_ Pr(i?xc|g,i^bkg) _ PijDxc, Dbi.,\H) 

Pr(i?xc|i?o,i?bkg) Pr(i^xc,|i^o)Pr(i?bkg|i^)■ ^ ' ' 

We find logSco — —0.3 (—0.95) for the for prior A (B). In other words, prior B puts more prior in 
the large signal regions, which, when these are subsequently excluded by the data, leads to stronger 
evidence against the signal hypothesis. 

7 Summary and conclusions 

Due to the large theoretical uncertainties of the calculated nuclear matrix elements, a statistical 
analysis of neutrinoless double beta decay experiments within the standard model with massive 



'AH logarithms of Bayes factors in this section have a numerical uncertainty of 0.1 or smaller. 
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Majorana neutrinos, or any other model predicting the decay, is not straightforward. We have 
chosen to perform a Bayesian analysis, which, in a addition to all the usual advantages, makes it 
possible to take these uncertainties into account in a statistically coherent manner. 

From the analysis of the data used to claim the observation of neutrinoless double beta decay 
in ^^Ge we find strong evidence in favor of a peak in the spectrum and moderate evidence that 
the peak is actually close to the energy expected for the neutrinoless decay. We also find a lower 
significance and a significantly larger statistical error than the original analysis, which we have 
taken into account when comparing with the other data. 

Before one combines all the data to yield the final constraints on the models and their param- 
eters, one should first test if the different data sets are mutually compatible. We have performed 
such a test of the consistency of the claim in ^^Gc with the recent measurements using ^'^®Xe, within 
the standard model with massive Majorana neutrinos, and found that the two data sets are about 
40 to 80 times more probable under the assumption that they are incompatible, depending on the 
nuclear matrix element uncertainties and the prior on the lightest neutrino mass. In other words, 
there is moderate to strong evidence of incompatibility, and for equal prior probabilities, the poste- 
rior probability of compatibility is between 1.3% and 2.5%. The results are only weakly dependent 
on the choice of priors and NME uncertainties. If one, despite such evidence for incompatibility, 
combines the two measurements, we find that there is no significant evidence of neutrinoless double 
beta decay. If one ignores the claim, there is weak evidence against the existence of the decay. 
We have also performed approximate frequentist tests of the compatibility of the two sets of ex- 
periments, assuming different fixed ratios of the nuclear matrix elements, and have found that the 
results depend strongly on the value of the NME ratio, as expected. 

In addition to the experiments using ^'^^Xe as the decaying nucleus used in this work, there are 
other experiments utilizing other nuclei expected to deliver data in the near future. This includes 
GERDA [43] which is searching for the decay of ^^Ge, enabling a comparison with the claim of 
Rcf. [11] without needing to consider the NME uncertainties. In order get the most information out 
of the experiments on possible models generating the decay, combined analyses within those models 
should be performed. The analysis of this work could then be extended to other combinations 
of experiments, as well as different mediating mechanisms. If GERDA were to find evidence of 
neutrinoless double beta decay, while Xenon-based experiments continue to decrease the upper 
limits on the ^'^^Xe decay rate, one would have to consider other particle physics models as possible 
sources of the decay, and an extension of the Bayesian analysis presented here could be used to 
differentiate between such models. 
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